Systems and Methods for Tuning Automatic Speech Recognition Systems

ABSTRACT

A tuning system for tuning a speech recognition system includes a transmitter for sending a user response to a speech recognition system. The user response is based at least in part on a test stimulus that may be generated by a control system. A receiver receives a recognized response from the speech recognition system; this recognized response is based at least in part on the user response. An adjustment module adjusts at least one parameter of the speech recognition system based at least in part on at least one of the test stimulus, the user response, and the recognized response.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application No. 61/164,451, filed Mar. 29, 2009, the disclosure of which is hereby incorporated by reference herein in its entirety.

FIELD OF THE INVENTION

This invention relates generally to tuning automatic speech recognition systems, and more specifically, to self-tuning automatic speech recognition systems based on a user's speech model.

BACKGROUND

Multi-channel Cochlear Implant (CI) systems consist of an external headset with a microphone and transmitter, a body-worn or ear-level speech processor with a battery supply, and an internal receiver and electrode array. The microphone detects sound information and sends it to the speech processor, which encodes the sound information into a digital signal. This information then is sent to the headset so that the transmitter can send the electrical signal through the skin via radio frequency waves to the internal receiver located in the mastoid bone of an implant recipient.

The receiver sends the electrical impulses to the electrodes implanted in the cochlea, thus stimulating the auditory nerve such that the listener receives sound sensations. Multi-channel CI systems utilize a plurality of sensors or electrodes. Each sensor is associated with a corresponding channel which carries signals of a particular frequency range. Accordingly, the sensitivity or amount of gain perceived by a recipient can be altered for each channel independently of the others.

In recent years, CI systems have made significant strides in improving the quality of life for profoundly hard of hearing individuals. CI systems have progressed from providing a minimal level of tonal response to allowing individuals having the implant to recognize upwards of 80 percent of words in test situations. Much of this improvement has been based upon improvements in speech coding techniques. For example, the introduction of Advanced Combination Encoders (ACE), Continuous Interleaved Sampling (CIS), and HiResolution has contributed to improved performance for CI systems, as well as for other digital hearing enhancement systems which incorporate multi-channel and/or speech processing techniques.

Once a CI system is implanted in a user, or another type of digital hearing enhancement mechanism is worn by a user, a suitable speech coding strategy and mapping strategy must be selected to enhance the performance of the CI system for day-to-day operation. Mapping strategy refers to the adjustment of parameters corresponding to one or more independent channels of a multi-channel CI system or other hearing enhancement system. Selection of each of these strategies typically occurs over an introductory period of approximately six or seven weeks during which the hearing enhancement system is tuned. During this tuning period, users of such systems are asked to provide feedback on how they feel the device is performing. The tuning process, however, is not a user-specific process. Rather, the tuning process is geared to the average user.

More particularly, to create a mapping for a speech processor, an audiologist first determines the electrical dynamic range for each electrode or sensor used. The programming system delivers an electrical current through the CI system to each electrode in order to obtain the electrical threshold (T-level) and comfort or max level (C-level) measures defined by the device manufacturers. T-level, or minimum stimulation level, is the softest electrical current capable of producing an auditory sensation in the user 100 percent of the time. The C-level is the loudest level of signal to which a user can listen comfortably for a long period of time.

The speech processor then is programmed, or “mapped,” using one of several encoding strategies so that the electrical current delivered to the implant will be within this measured dynamic range, between the T- and C-levels. After T- and C-levels are established and the mapping is created, the microphone is activated so that the patient is able to hear speech and sounds in the environment. From that point on, the tuning process continues as a traditional hearing test. Hearing enhancement device users are asked to listen to tones of differing frequencies and volumes. The gain of each channel further can be altered within the established threshold ranges such that the patient is able to hear various tones of differing volumes and frequencies reasonably well. Accordingly, current tuning practice focuses on allowing a user to become acclimated to the signal generated by the hearing device.
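
As an illustration of the mapping arithmetic described above, the following sketch compresses an acoustic input level into the measured electrical dynamic range of a single channel. It is a minimal sketch only: the linear compression rule, the names, and the default limits are assumptions for illustration, not any manufacturer's actual encoding strategy.

    from dataclasses import dataclass

    @dataclass
    class ChannelMap:
        # Per-channel fitting parameters (device units are hypothetical).
        t_level: float        # T-level: softest current producing a sensation
        c_level: float        # C-level: loudest comfortable current
        gain_db: float = 0.0  # channel gain, adjustable during tuning

    def acoustic_to_current(level_db, ch, floor=25.0, ceiling=65.0):
        # Compress an acoustic level (dB SPL) into [T, C] for one channel;
        # linear compression is an assumption, and real encoders differ.
        x = min(max(level_db + ch.gain_db, floor), ceiling)
        frac = (x - floor) / (ceiling - floor)
        return ch.t_level + frac * (ch.c_level - ch.t_level)

    # Example: a channel with a measured dynamic range of 100-200 units.
    low = ChannelMap(t_level=100.0, c_level=200.0)
    print(acoustic_to_current(45.0, low))  # 150.0: mid-level sound, mid-range current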

The above-mentioned tuning technique has been developed to meet the needs of the average user. This approach has gained favor because the amount of time and the number of potential variables involved in designing optimal maps for individual users would make such a design too daunting a task. For example, additional complications to the tuning process exist when users attempt to add subjective input to the tuning of the hearing enhancement system. Using subjective input from a user can add greater complexity to the tuning process, as each change in the mapping of a hearing enhancement system requires the user to adjust to a new signal. Accordingly, after a mapping change, users may believe that their ability to hear has been enhanced while, in actuality, they have simply not yet adjusted to the new mapping. Indeed, as users adjust to a new mapping, their hearing may in fact have been degraded.

Tuning methods and systems also have value outside of the cochlear implant or hearing device space, for example, for automatic speech recognition (“ASR”) systems. ASR systems are often incorporated into such technologies as cellular or other phones (in so-called “voicedial” or “speak-to-talk” systems), computer-based speech-to-text software for word processing, voicemail-to-email conversion systems (which send the contents of an audio voicemail in a text email format), and automated phone systems (for example, automated call-in centers for customer service). A tuning system would allow an ASR system to be tuned to match a particular speech model of a user, notwithstanding the ASR system's initial programming, thus making the technology in which the ASR system is incorporated useful for a larger number of users.

Since different people pronounce the same words differently, it is helpful to tune an ASR system to a user's particular speech model. Such tuning allows ASR systems to be modified such that the system perceives an appropriate stimulus, notwithstanding any particular speech model of the person using the ASR system. Returning to the voicedial or speak-to-talk example, if the ASR system contained in the cell phone is initially set at the point of manufacture to recognize stimuli as spoken by a typical speaker, it should operate properly when such a typical speaker (i.e., one that suffers from no speech-related impediment) uses the ASR system. However, if that ASR system is being used by a person with a speech impediment, it may incorrectly recognize certain stimuli, and may dial the incorrect contact or take other inappropriate action.

What is needed is a tuning system that would allow an ASR system to be tuned to match the particular speech model of any user, notwithstanding its initial programming, thus making the device useful for a larger number of users. Research has been performed regarding such systems for tuning ASR systems, but the resulting tuning systems still display performance limitations.

SUMMARY OF THE INVENTION

The present invention, according to one embodiment, provides a solution for tuning hearing enhancement systems. The inventive arrangements disclosed herein can be used with a variety of digital hearing enhancement systems, including digital hearing aids and cochlear implant systems. Other exemplary systems in which the inventive arrangements disclosed herein can be used include mobile phones configured to communicate via a cellular communications network and/or wireless ad hoc network. Still another exemplary system is a telephone configured to communicate via a Voice-over-Internet-Protocol (VoIP) network and/or adapted to communicate via a plain old telephone service (POTS) network. These various systems are herein referred to collectively as “hearing devices.” In accordance with the present invention, rather than using conventional hearing tests where only tones are used for purposes of testing a hearing device, speech perceptual tests can be used.

More particularly, speech perceptual tests wherein various words and/or syllables of the test are representative of distinctive language and/or speech features can be correlated with adjustable parameters of a hearing device. By detecting words and/or syllables that are misrecognized by a user, the hearing device can be tuned to achieve improved performance over conventional methods of tuning hearing devices.

In other embodiments, the present invention provides a solution for characterizing various communications channels and adjusting those channels to overcome distortions and/or other deficiencies.

One aspect of the present invention can include a method of tuning a digital hearing device. The method can include playing portions of test audio, wherein each portion of test audio represents one or more distinctive features of speech, receiving user responses to played portions of test audio heard through the digital hearing device, and comparing the user responses with the portions of test audio. An operational parameter of the digital hearing device can be adjusted according to the comparing step, wherein the operational parameter is associated with one or more of the distinctive features of speech.

In another embodiment, the method can include, prior to the adjusting step, associating one or more of the distinctive features of the portions of test audio with the operational parameter of the digital hearing device. Each distinctive feature of speech can be associated with at least one frequency or temporal characteristic. Accordingly, the operational parameter can control processing of frequency and/or temporal characteristics associated with at least one of the distinctive features.

The method further can include determining that at least a portion of the digital hearing device is located in a sub-optimal location according to the comparing step. The steps described herein also can be performed for at least one different language as well as for a plurality of different users of similar hearing devices.

Another aspect of the present invention can include a method of evaluating a communication channel. The method can include playing, over the communication channel, portions of test audio, wherein each portion of test audio represents one or more distinctive features of speech. The method can include receiving user responses to played portions of test audio, comparing the user responses with the portions of test audio, and associating distinctive features of the portions of test audio with operational parameters of the communication channel.

In another embodiment, the method can include adjusting at least one of the operational parameters of the communication channel according to the comparing and associating steps. Notably, the communication channel can include an acoustic environment formed by an architectural structure or an underwater acoustic environment, or the communication channel can mimic aviation effects on speech and hearing. For example, the communication channel can mimic effects such as G-force, masks, and the Lombard effect on hearing. The steps disclosed herein also can be performed in cases where the user exhibits signs of stress or fatigue.

Other embodiments of the present invention can include a machine-readable storage programmed to cause a machine to perform the steps disclosed herein, as well as a system having means for performing the various steps described herein.

In another aspect, the invention relates to a tuning system for tuning a speech recognition system, the tuning system including a transmitter for sending an associated user response to a speech recognition system, wherein the associated user response is based at least in part on a test stimulus, a receiver for receiving a recognized response from a speech recognition system, wherein the recognized response is based at least in part on the associated user response, and an adjustment module for adjusting at least one parameter of a speech recognition system based at least in part on at least one of the test stimulus, the associated user response, and the recognized response. In an embodiment, the tuning system includes a test stimulus generation module for sending a test stimulus to a user. In another embodiment, the tuning system includes a comparison module for comparing the associated user response to the recognized response, wherein the comparison module identifies an error between the associated user response and the recognized response. In yet another embodiment, the tuning system includes a comparison module for comparing the test stimulus to the recognized response, wherein the comparison module identifies an error between the test stimulus and the recognized response. In still another embodiment, the comparison module compares an acoustic feature of the test stimulus to an acoustic feature of the associated user response.

In another embodiment of the above aspect, the acoustic feature includes at least one of a cepstral coefficient and a speech feature. In another embodiment, the adjustment module adjusts the at least one parameter based at least in part on the error. In yet another embodiment, the adjustment module predicts at least a second parameter based at least in part on the error. In still another embodiment, the tuning system includes a storage module for storing at least one of the test stimulus, the associated user response, and the recognized response. In another embodiment, the storage module stores a plurality of test stimuli, a plurality of associated user responses, and a plurality of recognized responses. In another embodiment, the comparison module compares at least two of the plurality of test stimuli, the plurality of associated user responses, and the plurality of recognized responses and generates a speech model based at least in part on the comparison.

In another aspect, the invention relates to a method of tuning a speech recognition system, the method including the steps of transmitting an associated user response to a speech recognition system, wherein the associated user response is based at least in part on a test stimulus, receiving a recognized response from a speech recognition system, wherein the recognized response is based at least in part on the associated user response, and adjusting at least one parameter of a speech recognition system based at least in part on at least one of the test stimulus, the associated user response, and the recognized response. In an embodiment, the method includes selecting a test stimulus and sending the test stimulus to a user. In another embodiment, the method includes the step of comparing the associated user response to the recognized response. In yet another embodiment, the method includes the step of storing the associated user response and the recognized response. In still another embodiment, the method includes the steps of repeating the selecting step, the sending step, the transmitting step, the receiving step, the adjusting step, the comparing step, and the storing step, and creating an error set.

In another embodiment of the above aspect, the error set includes a first difference between a first associated user response and a first recognized response and a second difference between a second associated user response and a second recognized response. In another embodiment, the method includes the step of predicting at least a second parameter based at least in part on the error set. In yet another embodiment, the comparing step compares an acoustic feature of the associated user response to an acoustic feature of the recognized response. In still another embodiment, the acoustic feature includes at least one of a cepstral coefficient and a speech feature.

In another aspect, the invention relates to an article of manufacture having computer-readable program portions embedded thereon for tuning a speech recognition system, the program portions including instructions for transmitting an associated user response to a speech recognition system, wherein the associated user response is based at least in part on a test stimulus, instructions for receiving a recognized response from a speech recognition system, wherein the recognized response is based at least in part on the associated user response, and instructions for adjusting at least one parameter of a speech recognition system based at least in part on at least one of the test stimulus, the associated user response, and the recognized response.
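
The claimed transmitter, receiver, comparison, storage, and adjustment elements can be read together as a single closed loop. The skeleton below is a minimal sketch of that loop under assumed names, with a toy recognizer standing in for a real ASR engine; it illustrates the flow of the claims rather than any particular implementation.

    from dataclasses import dataclass, field

    @dataclass
    class TuningSystem:
        # Skeleton of the claimed modules (all names hypothetical).
        parameters: dict = field(default_factory=lambda: {"gain": 1.0})
        log: list = field(default_factory=list)  # storage: (s, r, r') triples

        def run_trial(self, stimulus, user_response, recognizer):
            recognized = recognizer(user_response, self.parameters)  # receiver
            self.log.append((stimulus, user_response, recognized))   # storage
            if recognized != user_response:                          # comparison
                self.adjust()                                        # adjustment

        def adjust(self):
            # Placeholder update; a real system would map the identified
            # error onto specific recognizer parameters.
            self.parameters["gain"] *= 1.05

    # Toy recognizer standing in for an ASR: it confuses "sam" with "sham"
    # until the (hypothetical) gain parameter has been raised far enough.
    def toy_recognizer(speech, params):
        return "sham" if speech == "sam" and params["gain"] < 1.2 else speech

    ts = TuningSystem()
    for _ in range(5):
        ts.run_trial("sam", "sam", toy_recognizer)
    print(ts.parameters["gain"], ts.log[-1])  # gain raised until s, r, r' agree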

BRIEF DESCRIPTION OF THE DRAWINGS

There are shown in the drawings embodiments which are presently preferred, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.

FIG. 1A is a schematic diagram illustrating an exemplary system for determining relationships between distinctive features of speech and adjustable parameters of a hearing enhancement system in accordance with the inventive arrangements disclosed herein.

FIG. 1B is a schematic diagram of a cellular phone configured to communicate via a cellular communications network and including a system for determining relationships between distinctive features of speech and adjustable parameters in order to tune the cellular phone to the hearing requirements of a particular user in accordance with the inventive arrangements disclosed herein.

FIG. 1C is a schematic diagram of a mobile phone configured to communicate via a wireless ad hoc communications network and including a system for determining relationships between distinctive features of speech and adjustable parameters in order to tune the mobile phone to the hearing requirements of a particular user in accordance with the inventive arrangements disclosed herein.

FIG. 1D is a schematic diagram of a telephone configured to communicate via a telephony communications network and including a system for determining relationships between distinctive features of speech and adjustable parameters in order to tune the telephone to the hearing requirements of a particular user in accordance with the inventive arrangements disclosed herein.

FIG. 2 is a flow chart illustrating a method of determining relationships between distinctive features of speech and adjustable parameters of hearing enhancement systems in accordance with the inventive arrangements disclosed herein.

FIGS. 3A and 3B are tables illustrating exemplary operational parameters of one variety of hearing enhancement system, such as a Cochlear Implant, that can be modified using suitable control software.

FIG. 4 is a schematic diagram illustrating an exemplary system for determining a mapping for a hearing enhancement system in accordance with the inventive arrangements disclosed herein.

FIG. 5 is a flow chart illustrating a method of determining a mapping for a hearing enhancement system in accordance with the inventive arrangements disclosed herein.

FIG. 6A is a flow chart illustrating a method of tuning an ASR system in accordance with the inventive arrangements disclosed herein.

FIG. 6B is a schematic diagram illustrating an exemplary system for determining relationships between distinctive features of speech and adjustable parameters of an ASR system in accordance with the inventive arrangements disclosed herein.

FIG. 6C is a schematic diagram of a method for tuning an ASR system in accordance with the inventive arrangements disclosed herein.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1A is a schematic diagram illustrating an exemplary system 100 for determining relationships between distinctive speech and/or language features and adjustable parameters of a hearing enhancement system (hearing device) in accordance with the inventive arrangements disclosed herein. As previously noted, such hearing devices can include any of a variety of digital hearing enhancement systems such as cochlear implant systems, digital hearing aids, or any other such device having digital processing and/or speech processing capabilities. Other hearing devices, in accordance with the invention, can include voice-based communication systems such as mobile phones configured to communicate via a cellular communications network and/or wireless ad hoc network, as well as telephones configured to communicate via a Voice-over-Internet-Protocol (VoIP) network and/or adapted to communicate via a plain old telephone service (POTS) network.

More particularly, the system 100 can include an audio playback system (playback system) 105, a monitor 110, and a confusion error matrix (CEM) 115. The playback system 105 can audibly play recorded words and/or syllables to a user having a hearing device to be tuned. The playback system 105 can be any of a variety of analog and/or digital sound playback systems. According to one embodiment of the present invention, the playback system 105 can be a computer system having digitized audio stored therein. According to still another embodiment, the playback system 105 can include a text-to-speech (TTS) system capable of generating synthetic speech from input or stored text.

While the playback system 105 can simply play aloud to a user recorded and/or generated audio, it should be appreciated that in some cases the playback system 105 can be communicatively linked with the hearing device under test. For example, in the case of selected digital hearing aids and/or cochlear implant systems, an A/C input jack can be included in the hearing device that allows the playback system 105 to be connected to the hearing device to play audio directly through the A/C input jack without having to generate sound via acoustic transducers.

The playback system 105 can be configured to play any of a variety of different test words and/or syllables to the user (test audio). Accordingly, the playback system 105 can include or play commonly accepted test audio. For example, according to one embodiment of the present invention, the well-known Iowa Test Battery, as disclosed by Tyler et al. (1986), of consonant-vowel-consonant nonsense words can be used. As noted, depending upon the playback system 105, a medium such as a tape or compact disc can be played, the test battery can be loaded into a computer system for playback, or the playback system 105 can generate synthetic speech mimicking a test battery.

Regardless of the particular set or listing of words and/or syllables used, each of the words and/or syllables can represent a particular set of one or more distinctive features of speech. Two distinctive feature sets have been proposed. The first set of features, proposed by Chomsky and Halle (1968), is based upon the articulatory positions underlying the production of speech sounds.

Another set of features, proposed by Jakobson, Fant, and Halle (1963), is based upon the acoustic properties of various speech sounds. These properties describe a small set of contrastive acoustic properties that are perceptually relevant for the discrimination of pairs of speech sounds. More particularly, as will be readily understood by one of ordinary skill, the different distinctive features and their potential acoustic correlates can be broadly grouped into three categories: fundamental source features; secondary consonantal source features; and resonance features.

The fundamental source features can be further characterized on the basis of whether the speech sounds are vocalic or non-vocalic. Vocalic speech corresponds to speech sounds associated with vowels. Accordingly, such speech sounds correspond to a single periodic source, the onset of the speech not being abrupt; otherwise the speech sound can be characterized as non-vocalic. The fundamental source features also can be characterized on the basis of whether the speech sounds are consonantal or non-consonantal. Consonantal speech sounds correspond to sounds associated with consonants. Such speech sounds are characterized by the presence of zeros in the associated spectrum of the sounds.

The secondary consonantal source features can be further characterized on the basis of whether the speech sounds are interrupted or continuant. Continuant speech sounds are also characterized as semi-vowels because of their similar sound quality. There is little or no friction with continuant speech sounds, as the air passes freely out through the mouth of the speaker. A continuant speech sound is produced with an incomplete closure of the vocal tract. Interrupted speech sounds, by contrast, have an abrupt onset.

The secondary consonantal features can also be characterized on the basis of whether the speech sounds are checked or unchecked. Checked speech sounds, typified by some Far Eastern and African languages, are characterized by abrupt termination as opposed to gradual decay, whereas unchecked speech sounds are characterized by gradual decay. Additionally, secondary consonantal features can be characterized as strident or mellow. The former typically has an irregular waveform, whereas the latter typically has a smooth waveform. A secondary consonantal feature characterized as mellow also has a wider autocorrelation function relative to a corresponding normalized strident feature. Secondary consonantal features can also be classified according to whether the sound is voiced or voiceless.

The resonance features can be further characterized on the basis of whether the speech sound is compact or diffuse. A compact feature is associated with sound having a relative predominance of one centrally located formant region, whereas a diffuse feature implies sound having one or more non-central formants. The resonance features can also be characterized as grave or acute. Speech sounds that are characterized as grave are low-frequency dominant, whereas those characterized as acute are high-frequency dominant. Additionally, resonance features can be characterized as flat or plain, depending on whether there is a downward shift of some or all formants, typically associated with vowels and a reduction in lip orifice of the speaker.

The resonance features also can be further characterized as sharp or plain, the former characterizing speech sounds whose second and/or higher formants rise. Moreover, resonance features can also be characterized as tense or lax, depending on the amount and duration of the energy of the sound. The resonance features also can be classified according to whether the speech sound is characterized as having a nasal formant or a nasal murmur. The distinctive speech features and their potential acoustic correlates are further described in R. Jakobson, G. M. Fant, and M. Halle, PRELIMINARIES TO SPEECH ANALYSIS: THE DISTINCTIVE FEATURES AND THEIR CORRELATES (MIT Press, Cambridge; 1963), which is incorporated herein by reference in its entirety.
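
To make the feature bookkeeping concrete, each test sound can be annotated with a small binary feature vector drawn from the inventory above. The encoding below is illustrative only; authoritative feature assignments for any given phoneme should be taken from the cited Jakobson, Fant, and Halle text rather than from this sketch.

    # Hypothetical feature annotations for two contrasting sounds.
    FEATURES = ("vocalic", "consonantal", "continuant", "strident", "compact")

    phoneme_features = {
        # /s/ vs. /sh/ (as in "Sam" vs. "sham"): both strident continuants
        # that differ along the compact/diffuse resonance dimension.
        "s":  {"vocalic": 0, "consonantal": 1, "continuant": 1, "strident": 1, "compact": 0},
        "sh": {"vocalic": 0, "consonantal": 1, "continuant": 1, "strident": 1, "compact": 1},
    }

    def feature_diff(a, b):
        # Distinctive features on which two sounds disagree.
        return [f for f in FEATURES
                if phoneme_features[a][f] != phoneme_features[b][f]]

    print(feature_diff("s", "sh"))  # ['compact'] -> a resonance-feature confusion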

The above-described distinctive features of speech sounds and their potential acoustic correlates are only examples of the many different distinctive features of speech for which a relationship with one or more adjustable parameters can be determined according to the invention described herein. Accordingly, regardless of the particular distinctive features of speech of interest in a particular context, the invention can determine relationships between the distinctive features and adjustable parameters for enhancing the capacity of a particular hearing device for a particular user of the device.

It should be appreciated that any of a variety of different features of speech can be used within the context of the present invention. Any feature set that can be correlated to test words and/or syllables can be used. As such, the invention is not limited to the use of a particular set of speech features and further can utilize a conglomeration of one or more feature sets.

The monitor system 110 can be a human being who records the various test words/syllables provided to the user and the user responses. In another embodiment, the monitor system 110 can be a speech recognition system configured to speech-recognize, or convert to text, user responses. For example, after hearing a word and/or syllable, the user can repeat the perceived test audio aloud.

In yet another embodiment, the monitor system 110 can include a visual interface through which the user can interact. The monitor system can include a display upon which different selections are shown. Thus, the playback of particular test words or syllables can be coordinated and/or synchronized with the display of possible answer selections that can be chosen by the user. For example, if the playback system 105 played the word “Sam,” possible selections could include the correct choice “Sam” and one or more incorrect choices, such as “sham.” The user chooses the selection corresponding to the user's understanding or ability to perceive the test audio.

In any case, the monitor system 110 can note the user response and store the result in the CEM 115. The CEM 115 is a log of which words and/or syllables were played to the user and the user responses. The CEM 115 can store both textual representations of test audio and user responses and/or the audio itself, for example as recorded through a computer system or other audio recording system. As shown, the audio playback system 105 can be communicatively linked to the CEM 115 so that audio data played to the user can be recorded within the CEM 115.
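
In software, one minimal realization of the CEM 115 is a counter keyed by (played, response) pairs. The class below is a sketch with hypothetical method names, not the implementation of the system described herein.

    from collections import Counter

    class ConfusionErrorMatrix:
        # Minimal CEM: counts how often each played item drew each response.
        def __init__(self):
            self.counts = Counter()  # (played, response) -> occurrences

        def record(self, played, response):
            self.counts[(played, response)] += 1

        def errors(self):
            # Only the misrecognitions logged during the test.
            return {k: n for k, n in self.counts.items() if k[0] != k[1]}

    cem = ConfusionErrorMatrix()
    cem.record("sam", "sham")  # the user reported "sham" when "Sam" was played
    cem.record("sam", "sam")
    print(cem.errors())        # {('sam', 'sham'): 1}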

While the various components of system 100 have been depicted as being separate or distinct components, it should be appreciated that various components can be combined or implemented using one or more individual machines or systems. For example, if a computer system is utilized as the playback system 105, the same computer system also can store the CEM 115. Similarly, if a speech recognition system is used, the computer system can include suitable audio circuitry and execute the appropriate speech recognition software.

Depending upon whether the monitor system 110 is a human being or a machine, the system 100, for example the computer, can be configured to automatically populate the confusion error matrix 115 as the testing proceeds. In that case, the computer system further can coordinate the operation of the monitor system 110, the playback system 105, and access to the CEM 115. Alternatively, a human monitor 110 can enter testing information into the CEM 115 manually.

FIG. 1B is a schematic diagram of a communications environment in which the system 100, as described, can be employed according to one embodiment of the invention. The communications environment is a cellular communication environment in which the particular hearing device is a cellular phone 120. The system 100 is illustratively integrated into the cellular phone 120. The cellular phone 120 can communicate via a cellular communications network 125 with other communications devices (not shown) that also communicatively link to the cellular communications network. The cellular phone 120 illustratively conveys and receives wireless communications signals via a cellular tower 130 and/or a communications satellite 135, the latter also illustratively communicating via wireless signals to a ground station 140. Signals between the cellular tower 130 and the ground station 140 are illustratively exchanged with a server 145 or other application-specific device, as will be readily understood by one of ordinary skill in the art.

In performing the functions described herein, the system 100 can be used to improve or optimize the cellular phone 120 so as to accommodate the unique hearing needs of a particular user of the device. Specifically, the system 100 allows the cellular phone to be programmed to present a series of speech sounds to a user of the cellular phone 120 in which the system is integrated. The user can repeat the sounds into the cellular phone 120. The system-presented sounds and the user's responses are compared using automatic speech recognition techniques based upon distinctive feature analysis, according to the invention. The differences, or errors, obtained using two sets of distinctive features can be used to tune the cellular phone 120; that is, the comparison and distinctive feature analysis applied by the system provide a basis by which to adjust operational parameters of the device to accommodate the particular hearing needs of the user. Appropriate tuning can improve the intelligibility of the speech heard by the user of the cellular phone 120.

FIG. 1C is a schematic diagram of an alternative communications environment in which the system 100, as described, can be employed according to yet another embodiment of the invention. The illustrated environment, according to this embodiment, comprises an ad hoc wireless network in which a plurality of wireless communications devices 150a-c communicate directly with one another through the exchange of wireless communications signals. At least one of the plurality of devices defines a hearing device 150a which, according to the present invention, includes the system 100 having the afore-described components of the system integrated into the device. Operatively, the system 100 presents sounds and compares the user's responses; by comparing the differences and applying distinctive feature analysis, the system 100 tunes the mobile device 150a. Thus, again, the system 100 can be used to improve or optimize the mobile hearing device 150a so as to accommodate the specific hearing needs of the user.

FIG. 1D is a schematic diagram of yet a different communications environment in which the system 100 can be employed according to still another embodiment of the invention. Within this environment, the hearing device is a telephone 155, such as a plain old telephone service (POTS) telephone or a VoIP telephone, configured to communicate with other devices (not shown) via a communications network 160 which comprises a POTS and/or data communications network. The system 100, whose components and operative features are those described herein, illustratively comprises a separate unit communicatively linked to the telephone 155. Alternatively, however, the system can be integrated into the telephone 155. Operatively, the system 100 presents certain sounds to the user of the telephone 155. Differences, or errors, between the device-presented sounds and the user's responses to the sounds are determined. Applying distinctive feature analysis, as described herein, the system 100 tunes the telephone 155 so that the telephone is operatively configured to accommodate the particular hearing needs of the telephone user.

FIG. 2 is a flow chart illustrating a method 200 of determining relationships between features of speech and adjustable parameters of hearing devices in accordance with the inventive arrangements disclosed herein. The method 200 can begin in a state where a hearing device worn by a user is to be tuned. In accordance with one aspect of the present invention, the user has already undergone an adjustment period of using the hearing device. For example, as the method 200 is directed to determining relationships between distinctive features of speech and parameters of a hearing device, it may be desirable to test a user who has already had ample time to physically adjust to wearing a hearing device.

The method 200 can begin in step 205 where a set of test words and/or syllables can be played to the user. In step 210, the user's understanding of the test audio can be monitored. That is, the user's perception of what is heard, production of what was heard, and transition can be monitored. For example, in one aspect of the present invention, the user can repeat any perceived audio aloud. As noted, the user responses can be automatically recognized by a speech recognition system or can be noted by a human monitor. In another aspect, the user can select an option from a visual interface indicating what the user perceived as the test audio.

In step 215, the test data can be recorded into the confusion error matrix. For example, the word played to the user can be stored in the CEM, whether as text, audio, or both. Similarly, the user responses can be stored as audio, textual representations of audio or speech-recognized text, or both. Accordingly, the CEM can maintain a log of test words/syllables and matching user responses. It should be appreciated by those skilled in the art that steps 205, 210, and 215 can be repeated for individual users such that portions of test audio can be played sequentially to a user until completion of a test.

After obtaining a suitable amount of test data, analysis can begin. In step 220, each error on the CEM can be analyzed in terms of a set of distinctive features represented by the test word or syllable. The various test words and/or syllables can be related or associated with the features of speech for which each such word and/or syllable is to test. Accordingly, a determination can be made as to whether the user was able to accurately perceive each of the distinctive features as indicated by the user's response. The present invention contemplates detecting both the user's perception of test audio as well as the user's speech production, for example in the case where the user responds by speaking back the test audio that is perceived. Mispronunciations by the user can serve as an indicator that one or more of the distinctive features represented by the mispronounced word or syllable are not being perceived correctly despite the use of the hearing device. Thus, either one or both methods can be used to determine the distinctive features that are perceived correctly and those that are not.
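
One hedged reading of step 220 is to project every CEM error onto the distinctive features carried by the test items, yielding a per-feature error rate. The item annotations and names in the sketch below are assumptions for illustration only.

    from collections import Counter

    # Hypothetical feature annotations for a few test items.
    item_features = {
        "bat": ("grave", "interrupted"),
        "pat": ("grave", "interrupted"),
        "sat": ("acute", "continuant"),
    }

    def feature_error_rates(trials):
        # trials: (played, response) pairs taken from the CEM.
        played, errors = Counter(), Counter()
        for p, r in trials:
            for feat in item_features[p]:
                played[feat] += 1
                if p != r:
                    errors[feat] += 1
        return {f: errors[f] / played[f] for f in played}

    trials = [("bat", "pat"), ("pat", "pat"), ("sat", "sat")]
    print(feature_error_rates(trials))
    # {'grave': 0.5, 'interrupted': 0.5, 'acute': 0.0, 'continuant': 0.0}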

In step 225, correlations between features of speech and adjustable parameters of a hearing device can be determined. For example, such correlations can be determined through an empirical, iterative process where different parameters of hearing devices are altered in serial fashion to determine whether any improvements in the user's perception and/or production result. Accordingly, strategies for altering parameters of a hearing device can be formulated based upon the CEM determined from the user's test session or during the test session.

In illustration, studies have shown that with respect to the distinctive features referred to as grave sounds, such sounds are characterized by a predominance of energy in the low frequency range of speech. Acute sounds, on the other hand, are characterized by energy in the high frequency range of speech. Accordingly, test words and/or syllables representing grave or acute sounds can be labeled as such. When a word exhibiting a grave or acute feature is misrecognized by a user, the parameters of the hearing device that affect the capability of the hearing device to accurately portray high or low frequencies of speech, as the case may be, can be altered. Thus, such parameters can be associated with the misrecognition of acute and/or grave features by a user. Similarly, interrupted sounds are those that have a sudden onset, whereas continuant sounds have a more gradual onset. Users who are not able to adequately discriminate this contrast may benefit from adjustments to device settings that enhance such a contrast.
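
The grave/acute illustration can be expressed as a small feature-to-parameter rule table. The parameter names, directions, and threshold below are hypothetical; actual strategies would come from the empirical correlation process of step 225.

    # Hypothetical strategy table: a poorly perceived feature names the
    # device parameter to adjust and the direction of the adjustment.
    STRATEGIES = {
        "grave":       ("low_band_gain", +1),   # boost low-frequency channels
        "acute":       ("high_band_gain", +1),  # boost high-frequency channels
        "interrupted": ("attack_time", -1),     # sharpen sudden onsets
    }

    def plan_adjustments(error_rates, threshold=0.3):
        # Turn per-feature error rates into an ordered adjustment plan.
        plan = []
        for feat, rate in sorted(error_rates.items(), key=lambda kv: -kv[1]):
            if rate >= threshold and feat in STRATEGIES:
                plan.append(STRATEGIES[feat])
        return plan

    print(plan_adjustments({"grave": 0.5, "acute": 0.1}))
    # [('low_band_gain', 1)] -> raise low-frequency gain first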

According to one embodiment of the present invention, Modeling Field Theory (MFT) can be used to determine relationships between operational parameters of hearing devices and the recognition and/or production of distinctive features. MFT has the ability to handle combinatorial complexity issues that exist in the hearing device domain. MFT, as advanced by Perlovsky, combines a priori knowledge representation with learning and fuzzy logic techniques to represent intellect. The mind operates through a combination of complicated a priori knowledge or experience with learning. The optimization of the CI sensor map strategy mimics this type of behavior, since the tuning parameters may have different effects on different users.

Still, other computational methods can be used including, but not limited to, genetic algorithms, neural networks, fuzzy logic, and the like. Accordingly, the inventive arrangements disclosed herein are not limited to the use of a particular technique for formulating strategies for adjusting operational parameters of hearing devices based upon speech, or for determining relationships between operational parameters of hearing devices and recognition and/or perception of features of speech.

FIG. 3A is a table 300 listing examples of common operational parameters of hearing devices that can be modified through the use of a suitable control system, such as a computer or information processing system having appropriate software for programming such devices. FIG. 3B is a table 305 illustrating further operational parameters of hearing devices that can be modified using an appropriate control system. Accordingly, through an iterative testing process where a sampling of individuals is tested, relationships between test words, and therefore associated features of speech, and operational parameters of hearing devices can be established. By recognizing such relationships, strategies for improving the performance of a hearing device can be formulated based upon the CEM of a user undergoing testing. As such, hearing devices can be tuned based upon speech rather than tones.

FIG. 4 is a schematic diagram illustrating an exemplary system 400 for determining a mapping for a hearing device in accordance with the inventive arrangements disclosed herein. As shown, the system 400 can include a control system 405, a playback system 410, and a monitor system 415. The system 400 further can include a CEM 420 and a feature-to-map-parameter knowledge base (knowledge base) 425.

The playback system 410 can be similar to the playback system as described with reference to FIG. 1. The playback system 410 can play audio renditions of test words and/or syllables and can be directly connected to the user's hearing device. Still, the playback system 410 can play words and/or syllables aloud without a direct connection to the hearing device.

The monitor system 415 also can be similar to the monitor system of FIG. 1. Notably, the playback system 410 and the monitor system 415 can be communicatively linked, thereby facilitating operation in a coordinated and/or synchronized manner. For example, in one embodiment, the playback system 410 can present a next stimulus only after the response to the previous stimulus has been recorded. The monitor system 415 can include a visual interface allowing users to select visual responses corresponding to the played test audio, for example various correct and incorrect textual representations of the played test audio. The monitor system 415 also can be a speech recognition system or a human monitor.

The CEM 420 can store a listing of played audio along with user responses to each test word and/or syllable. The knowledge base 425 can include one or more strategies for improving the performance of a hearing device as determined through iteration of the method of FIG. 2. The knowledge base 425 can be cross-referenced with the CEM 420, allowing a mapping for the user's hearing device to be developed in accordance with the application of one or more strategies as determined from the CEM 420 during testing. The strategies can specify which operational parameters of the hearing device are to be modified based upon errors noted in the CEM 420 determined in the user's test session.

The control system 405 can be a computer and/or information processing system which can coordinate the operation of the components of system 400. The control system 405 can access the CEM 420 being developed in a test session to begin developing an optimized mapping for the hearing device under test. More particularly, based upon the user's responses to test audio, the control system 405 can determine proper parameter settings for the user's hearing device.

In addition to initiating and controlling the operation of each of the components in the system 400, the control system 405 further can be communicatively linked with the hearing device worn by the user. Accordingly, the control system 405 can provide an interface through which modifications to the user's hearing device can be implemented, either under the control of test personnel such as an audiologist, or automatically under programmatic control based upon the user's resulting CEM 420. For example, the mapping developed by the control system 405 can be loaded into the hearing device under test.

While the system 400 can be implemented in any of a variety of different configurations, including the use of individual components for one or more of the control system 405, the playback system 410, the monitor system 415, the CEM 420, and/or the knowledge base 425, according to another embodiment of the present invention, the components can be included in one or more computer systems having appropriate operational software.

FIG. 5 is a flow chart illustrating a method 500 of determining a mapping for a hearing device in accordance with the inventive arrangements disclosed herein. The method 500 can begin in a state where a user, wearing a hearing device, is undergoing testing to properly configure the hearing device. Accordingly, in step 505, the control system can instruct the playback system to begin playing test audio in a sequential manner.

As noted, the test audio can include, but is not limited to, words and/or syllables, including nonsense words and/or syllables. Thus, a single word and/or syllable can be played. As portions of test audio are played, entries corresponding to the test audio can be made in the CEM indicating which word or syllable was played. Alternatively, if the ordering of words and/or syllables is predetermined, the CEM need not include a listing of the words and/or syllables used, as the user's responses can be correlated with the predetermined listing of test audio.

In step 510, a user response can be received by the monitor system. The user response can indicate the user's perception of what was heard. If the monitor system is visual, as each word and/or syllable is played, possible solutions can be displayed upon a display screen. For example, if the playback system played the word “Sam”, possible selections could include the correct choice “Sam” and an incorrect choice of “sham”. The user chooses the selection corresponding to the user's understanding or ability to perceive the test audio.

In another embodiment, the user could be asked to repeat the test audio. In that case, the monitor system can be implemented as a speech recognition system for recognizing the user's responses. Still, as noted, the monitor can be a human being annotating each user's response to the ordered set of test words and/or syllables. In any event, it should be appreciated that, depending upon the particular configuration of the system used, a completely automated process is contemplated.

In step 515, the user's response can be stored in the CEM. The user's response can be matched to the test audio that was played to elicit the user response. It should be appreciated that, if so configured, the CEM can include text representations of test audio and user responses, recorded audio representations of test audio and user responses, or any combination thereof.

In step 520, the distinctive feature or features represented by the portion of test audio can be identified. For example, if the test word exhibits grave sound features, the word can be annotated as such. In step 525, a determination can be made as to whether additional test words and/or syllables remain to be played. If so, the method can loop back to step 505 to repeat as necessary. If not, the method can continue to step 530. It should be appreciated that samples can be collected and a batch type of analysis can be run at the completion of the testing rather than as the testing is performed.

In step 530, based upon the knowledge base, a strategy for adjusting the hearing device to improve the performance of the hearing device with respect to the distinctive feature(s) can be identified. As noted, the strategy can specify one or more operational parameters of the hearing device to be changed to correct for the perceived hearing deficiency. Notably, the implementation of strategies can be limited to only those cases where the user misrecognizes a test word or syllable.

For example, if test words having grave sound features were misrecognized, a strategy directed at correcting such misperceptions can be identified. As grave sound features are characterized by a predominance of energy in the low frequency range of speech, the strategy implemented can include adjusting parameters of the hearing device that affect the way in which low frequencies are processed. For instance, the strategy can specify that the mapping should be updated so that the gain of a channel responsible for low frequencies is increased. In another embodiment, the frequency ranges of each channel of the hearing device can be varied.

It should be appreciated that the various strategies can be formulated to interact with one another. That is, the strategies can be implemented based upon an entire history of recognized and misrecognized test audio rather than only a single test word or syllable. As the nature of a user's hearing is non-linear, the strategies further can be tailored to adjust more than a single parameter, as well as to offset the adjustment of one parameter with the adjusting (i.e., raising or lowering) of another. In step 535, a mapping being developed for the hearing device under test can be modified. In particular, a mapping, whether a new mapping or an existing mapping, for the hearing device can be updated according to the specified strategy.
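
As a toy illustration of such offsetting adjustments, the sketch below applies a plan of (band, direction) changes to a per-channel mapping and partially offsets each boost in the neighboring band. The structure, names, and offset factor are all assumptions.

    def apply_plan(mapping, plan, offset=0.5):
        # Apply (band, direction) adjustments; each change is partially
        # offset in the next band to keep overall loudness roughly stable.
        bands = list(mapping)
        for band, direction in plan:
            i = bands.index(band)
            mapping[band] += direction
            if i + 1 < len(bands):
                mapping[bands[i + 1]] -= direction * offset
        return mapping

    mapping = {"low_band_gain": 0.0, "mid_band_gain": 0.0, "high_band_gain": 0.0}
    print(apply_plan(mapping, [("low_band_gain", +1)]))
    # {'low_band_gain': 1.0, 'mid_band_gain': -0.5, 'high_band_gain': 0.0}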

It should be appreciated, however, that the method 500 can be repeated as necessary to further develop a mapping for the hearing device. According to one aspect of the present invention, particular test words and/or syllables can be replayed, rather than the entire test set, depending upon which strategies are initiated to further fine-tune the mapping. Once the mapping is developed, the mapping can be loaded into the hearing device.

Different persons may form otherwise identical sounds and words differently, due to their particular speech models, which may include speech impediments, accents based on geographic origin, etc. In such a case, the person may be considered an “imperfect” transmitter (in that their speech is “impaired”) and the ASR may be considered a “perfect” receiver. Accordingly, it is desirable to tune an ASR so that any user's speech model may be effectively recognized by the ASR. Examples of systems and methods of tuning an ASR are described below with regard to FIGS. 6A-6C.

A proposed method for self-tuning an ASR system involves testing the user with a set of stimuli and generating a speech model for the user based on the difference between each stimulus and the user's corresponding response. This set of stimuli may be open or closed (i.e., limited to particular sounds that are particularly useful in perceptual testing). The difference between the stimulus and the response is analyzed in terms of certain features. The parameters of the ASR system are then tuned so that, each time, the recognized response is the same as the stimulus.

One embodiment of the method associated with an ASR tuning system is depicted in FIG. 6A. There, if s is a stimulus to a user, r is the user's response, and r′ is the recognized response from the ASR system, one goal of the tuning system is to minimize the difference between s and r′. This may be achieved by tuning the parameters of the ASR system, represented by the function ƒ of the difference between the stimulus s and the user's response r. The difference may be analyzed in terms of acoustic features, such as cepstral coefficients, speech features (such as grave, nasal, tense, strident, etc.), signal features (e.g., amplitude, phase, frequency, etc.), or a combination of the above. Additional features that may be analyzed are also contemplated and are described herein.
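
Concretely, FIG. 6A can be read as minimizing a feature-space distance d(s, r′) over the ASR parameters. The sketch below uses a plain Euclidean distance and a coordinate-descent update; the feature extractor is a stand-in for a real front end such as cepstral analysis, and every name here is an assumption.

    import math

    def feature_vector(signal):
        # Stand-in for a front end (e.g., cepstral coefficients); here each
        # "signal" is already a list of numbers so the sketch stays runnable.
        return list(signal)

    def distance(s, r_prime):
        # Euclidean distance between stimulus and recognized-response features.
        fs, fr = feature_vector(s), feature_vector(r_prime)
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(fs, fr)))

    def tune(params, s, recognize, step=0.1, iters=50):
        # Coordinate descent on the ASR parameters to shrink d(s, r').
        for _ in range(iters):
            for key in list(params):
                base = distance(s, recognize(s, params))
                for delta in (step, -step):
                    trial = dict(params)
                    trial[key] = params[key] + delta
                    if distance(s, recognize(s, trial)) < base:
                        params = trial
                        break
        return params

    # Toy chain: the user's speech model shifts the stimulus by 0.7, and the
    # ASR's "bias" parameter can cancel that shift during recognition.
    speak = lambda s: [x + 0.7 for x in s]
    asr = lambda r, p: [x - p["bias"] for x in r]
    print(tune({"bias": 0.0}, [1.0, 2.0, 3.0],
               lambda s, p: asr(speak(s), p)))  # bias converges to ~0.7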

FIG. 6B is a schematic diagram illustrating an exemplary system 600 for determining a mapping for an ASR system in accordance with the inventive arrangements disclosed herein. As shown, the system 600 can include an adjustment module 605, a transmitter 610, and a receiver 615. The system 600 further can include a comparison module 620 and a feature-to-map-parameter knowledge base (knowledge base) 625.

The transmitter 610 can be similar to the playback system described in FIG. 1. The transmitter 610 can play audio renditions of test words and/or syllables and can be directly connected to the ASR. Alternatively, in certain embodiments, the transmitter 610 may be a human user who is using the device into which the ASR is incorporated.

The receiver 615 can be similar to the monitor system described in FIG. 1. Notably, the transmitter 610 and the receiver 615 can be communicatively linked, thereby facilitating operation in a coordinated and/or synchronized manner. For example, in one embodiment, the transmitter 610 can present a next stimulus only after the response to the previous stimulus has been recorded. The receiver 615, if implemented as the monitor system of FIG. 1, can include a visual interface allowing users to select visual responses corresponding to the played test audio, for example various correct and incorrect textual representations of the played test audio. In alternative embodiments, the receiver may send the recognized response r′ to the comparison module 620.

The comparison module 620 may create a CEM similar to that described in FIG. 1, and can store a listing of played audio along with user responses to each test word and/or syllable. In alternative embodiments, the comparison module 620 may store any or all of the test stimulus s sent to the user, the user response r, and the ASR recognized response r′. The differences between the stimulus s, the user response r, and the ASR recognized response r′ are determined by the comparison module 620, which creates a confusion error matrix. The confusion error matrix may refer, in one instance, to the storage of errors between the stimulus and the response, as well as to the storage of errors using equations, logical expressions, stochastic/connectionist models, etc. In certain embodiments, the confusion error matrix compares the presented and produced phonemes. The matrix permits the calculation of measures that capture the accuracy of an ASR's recognized response with respect to the test stimuli. The data stored in the confusion error matrix might also be stored as: (1) algebraic functions (e.g., polynomials); (2) logical functions (e.g., first-order predicate logic); (3) one-dimensional arrays; (4) multi-dimensional matrices; (5) statistical models (e.g., Bayesian networks, Cox model, etc.); (6) connectionist models (e.g., parallel distributed processing networks, associative memory, etc.); or (7) rule-based models (e.g., if-then-else rules). Other modes of data storage are also contemplated. In general, the confusion error matrix encompasses all such functions/models that permit the calculation of measures to capture a patient's hearing ability.
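
For the phoneme-comparison embodiment, the matrix can be realized as a two-dimensional array of presented-versus-recognized counts, from which an overall accuracy measure falls out directly. The counts below are illustrative only.

    # Presented phonemes (rows) vs. ASR-recognized phonemes (columns).
    phonemes = ["s", "sh", "t"]
    matrix = [
        [8, 2, 0],   # "s" presented: recognized as "s" 8 times, as "sh" twice
        [1, 9, 0],   # "sh" presented
        [0, 0, 10],  # "t" presented
    ]

    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(phonemes)))
    print(f"recognition accuracy: {correct / total:.0%}")  # 90%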

The ASR may be tested with a closed set of simple nonsense sounds that are easy for the user to replicate in speech. Alternative testing may utilize actual words. One type of test may include presenting a set of stimuli to the user and recording the user's response corresponding to each stimulus, as well as the ASR recognized response. Assuming the user has normal hearing, the difference between the user response and the ASR recognized response represents the way the user speaks and contributes to the user's speech model.

The speech model is unique to each user. One way to view the speech model is as a set of points in high-dimensional space, where each point represents the error at a particular ASR system parameter setting. The error is a function of the differences between each user response and the recognized response over an entire test. A tuning algorithm studies the speech model to predict the most plausible ASR parameters. With one or more tests, the optimal ASR system parameter settings can be reached so as to minimize the difference between s and r′.
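
Viewing the speech model as an error surface over parameter settings suggests a direct, if brute-force, realization: evaluate the aggregate test error at each candidate setting and keep the minimizer. The two-parameter recognizer below is a toy stand-in for an ASR, and all names are assumptions.

    import itertools

    def test_error(params, trials, recognize):
        # Aggregate error over a whole test: one point on the error surface.
        return sum(1 for r, expected in trials if recognize(r, params) != expected)

    def best_setting(grid, trials, recognize):
        # Exhaustive search over candidate parameter settings.
        return min(grid, key=lambda p: test_error(p, trials, recognize))

    # Toy recognizer: answers "yes" only when its two parameters bracket
    # the input value.
    recognize = lambda r, p: "yes" if p["lo"] <= r <= p["hi"] else "no"
    trials = [(0.3, "yes"), (0.6, "yes"), (0.9, "no")]
    grid = [{"lo": lo, "hi": hi}
            for lo, hi in itertools.product((0.0, 0.2, 0.4), (0.5, 0.7, 1.0))]
    print(best_setting(grid, trials, recognize))  # {'lo': 0.0, 'hi': 0.7}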

Returning to FIG. 6B, knowledge base 625 can include one or more strategies for improving the performance of an ASR system as determined through iteration of the method of FIG. 5. The knowledge base 625 can be cross-referenced with the comparison module 620, allowing a mapping for the ASR system to be developed in accordance with the application of one or more strategies as determined from the comparison module 620 during testing. The strategies can specify which operational parameters of the ASR system are to be modified based upon errors noted in the confusion error matrix determined during a tuning session.

The control system or adjustment module 605 can be a computer and/or information processing system which can coordinate the operation of the components of the system 600, as well as adjust the operational parameters of the ASR system. The adjustment module 605 can access the comparison module 620 being developed in a test session to begin developing an optimized mapping for the ASR system being tuned. Based upon the user's responses to test stimuli, the adjustment module 605 can determine proper parameter settings for the ASR system.
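
One way the adjustment module 605 might consult the knowledge base 625 is sketched below in Python; the confusion counts, strategy rules, and parameter names are hypothetical illustrations rather than the actual rule set.

    def select_adjustments(confusions, strategies):
        """confusions: {(presented, recognized): count} from a tuning session.
        strategies: knowledge base mapping a confusion pair to the ASR
        parameter change believed to reduce that class of error."""
        adjustments = {}
        for (presented, recognized), count in confusions.items():
            if presented != recognized and count > 0:
                adjustments.update(strategies.get((presented, recognized), {}))
        return adjustments

    # Illustrative knowledge base; the parameter names are placeholders.
    strategies = {
        ("b", "p"): {"voicing_weight": +0.1},
        ("s", "f"): {"high_freq_gain": +0.05},
    }
    print(select_adjustments({("b", "p"): 3, ("d", "d"): 5}, strategies))
    # {'voicing_weight': 0.1}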

In addition to initiating and controlling the operation of each of the components in the system 600, the adjustment module 605 further can be communicatively linked with the ASR system. Accordingly, the adjustment module 605 can provide an interface through which modifications to the ASR system can be implemented under programmatic control based upon the user's resulting confusion error matrix. For example, the mapping developed by the adjustment module 605 can be loaded into the ASR system under test.

While the system 600 can be implemented in any of a variety of different configurations, including the use of individual components for one or more of the adjustment module 605, the transmitter 610, the receiver 615, the comparison module 620, and/or the knowledge base 625, according to another embodiment of the present invention, the components can be included in one or more computer systems having appropriate operational software. Alternatively, the system 600 may be incorporated directly into an ASR system that is used in a device.

FIG. 6C depicts a method for tuning an ASR system in accordance with the inventive arrangements disclosed herein. The method may be performed by the tuning system depicted in FIG. 6B or by another embodiment of a tuning system. In the depicted method 650, a test stimulus s is first selected 652, in this case by an adjustment module that also acts as the primary control system for the tuning system. The adjustment module then sends the stimulus to a user 654. In the depicted embodiment, the adjustment module prompts the user to speak the test stimulus, which may be a sound, phoneme, or word, as described above. The user then speaks the appropriate sound, which is transmitted to the ASR as an associated user response r 656. This associated user response r is based on the test stimulus. Differences between the test stimulus s and the associated user response r may be due to the user speech model. The tuning system then receives the ASR recognized response r′ from the ASR 658.

Next, the comparing step 660 compares two or more of the test stimulus s, the associated user response r, and the ASR recognized response r′. This comparison determines the differences between the two compared signals and creates the confusion error matrix. Thereafter, the signals may be stored 662 either in the confusion error matrix or in a separate storage module. If this is the first comparison 664 of a multi-comparison tuning session, the tuning system may adjust a parameter of the ASR system 666. If it is not the first comparison, the tuning system may create an error set 668 from the confusion error matrix based on any differences between any number of signals. As described above, as more differences are identified, the error set becomes more complex, leading to improved results of tuning the ASR system. Once an error set is created, the tuning system may predict additional parameters 670 based on known conditions, thus leading to increased tuning efficiency. Other tuning methods are also contemplated.
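
The control flow of method 650 might be sketched in Python as follows; speak, recognize, adjust, and predict are hypothetical callbacks standing in for the user, the ASR, and the adjustment module, and simple inequality tests stand in for the full confusion-error-matrix comparison.

    def tuning_session(stimuli, speak, recognize, adjust, predict):
        """Run one multi-comparison tuning session (steps 652-670)."""
        history, error_set = [], []
        for i, s in enumerate(stimuli):       # 652: select test stimulus s
            r = speak(s)                      # 654/656: user response r
            r_prime = recognize(r)            # 658: ASR recognized response r'
            errors = {"s_vs_r": s != r,       # 660: compare the signals
                      "r_vs_r_prime": r != r_prime}
            history.append((s, r, r_prime))   # 662: store the signals
            if i == 0:
                adjust(errors)                # 666: first comparison adjusts a parameter
            else:
                error_set.append(errors)      # 668: later comparisons grow the error set
        if error_set:
            predict(error_set)                # 670: predict additional parameters
        return history, error_set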

Those skilled in the art will recognize that the inventive arrangements disclosed herein can be applied to a variety of different languages. For example, to account for the importance of various distinctive features from language to language, each strategy can include one or more weighted parameters specifying the degree to which each hearing device parameter is to be modified for a particular language. The strategies of such a multi-lingual test system further can specify subsets of one or more hearing device parameters that may be adjusted for one language but not for another language. Accordingly, when a test system is started, the system can be configured to operate or conduct tests for an operator-specified language. Thus, test audio also can be stored and played for any of a variety of different languages.
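
A minimal Python sketch of such per-language weighting follows; the language codes, parameter names, and weight values are hypothetical illustrations, not actual strategy data.

    # Hypothetical per-language weights: each value scales how strongly a
    # device parameter may be modified for that language; a weight of 0.0
    # (or an omitted parameter) means the parameter is not tuned for it.
    LANGUAGE_WEIGHTS = {
        "en": {"vowel_space_gain": 1.0, "tone_sensitivity": 0.0},
        "zh": {"vowel_space_gain": 0.5, "tone_sensitivity": 1.0},
    }

    def scale_adjustments(adjustments, language):
        """Scale proposed parameter changes by the operator-specified
        language's weights, dropping parameters not tunable for it."""
        weights = LANGUAGE_WEIGHTS.get(language, {})
        return {p: delta * weights[p]
                for p, delta in adjustments.items()
                if weights.get(p, 0.0) != 0.0}

    print(scale_adjustments({"vowel_space_gain": 0.2, "tone_sensitivity": 0.1}, "zh"))
    # {'vowel_space_gain': 0.1, 'tone_sensitivity': 0.1}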

The present invention also can be used to overcome hearing device performance issues caused by the placement of the device within a user. For example, the placement of a cochlear implant within a user can vary from user to user. The tuning method described herein can mitigate performance issues caused, at least in part, by the particular placement of the cochlear implant.

Still further, the present invention can be used to adjust, optimize, compensate for, or model communication channels, whether an entire communication system, a particular piece of equipment, etc. Thus, by determining which distinctive features of speech are misperceived or are difficult to identify after the test audio has been played through the channel, the communication channel can be modeled. The distinctive features of speech can be correlated to various parameters and/or settings of the communication channel for purposes of adjusting or tuning the channel for increased clarity.

For example, the present invention can be used to characterize the acoustic environment resulting from a structure such as a building or other architectural work. That is, the effects of the acoustic and/or physical environment in which the speaker and/or listener is located can be included as part of the communication system being modeled. In another example, the present invention can be used to characterize and/or compensate for an underwater acoustic environment. In yet another example, the present invention can be used to model and/or adjust a communication channel or system to accommodate aviation effects such as effects on hearing resulting from increased G-forces, the wearing of a mask by a listener and/or speaker, or the Lombard effect. The present invention also can be used to characterize and compensate for changes in a user's hearing or speech as a result of stress, fatigue, or the user being engaged in deception.

The present invention can be realized in hardware, software, or a combination of hardware and software. The present invention can be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software can be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.

The present invention also can be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.

In the embodiments described above, the software may be configured to run on any computer or workstation such as a PC or PC-compatible machine, an Apple Macintosh, a Sun workstation, etc. In general, any device can be used as long as it is able to perform all of the functions and capabilities described herein. The particular type of computer or workstation is not central to the invention, nor is the configuration, location, or design of a database, which may be flat-file, relational, or object-oriented, and may include one or more physical and/or logical components.

The servers may include a network interface continuously connected to the network, and thus support numerous geographically dispersed users and applications. In a typical implementation, the network interface and the other internal components of the servers intercommunicate over a main bi-directional bus. The main sequence of instructions effectuating the functions of the invention and facilitating interaction among clients, servers, and a network can reside on a mass-storage device (such as a hard disk or optical storage unit) as well as in a main system memory during operation. Execution of these instructions and effectuation of the functions of the invention is accomplished by a central processing unit ("CPU").

A group of functional modules that control the operation of the CPU and effectuate the operations of the invention as described above can be located in system memory (on the server or on a separate machine, as desired). An operating system directs the execution of low-level, basic system functions such as memory allocation, file management, and operation of mass storage devices. At a higher level, a control block, implemented as a series of stored instructions, responds to client-originated access requests by retrieving the user-specific profile and applying the one or more rules as described above.

Communication may take place via any media such as standard telephone lines, LAN or WAN links (e.g., T1, T3, 56 kb, X.25), broadband connections (ISDN, Frame Relay, ATM), wireless links, and so on. Preferably, the network can carry TCP/IP protocol communications, and HTTP/HTTPS requests made by the client and the connection between the client and the server can be communicated over such TCP/IP networks. The type of network is not a limitation, however, and any suitable network may be used. Typical examples of networks that can serve as the communications network include a wireless or wired Ethernet-based intranet, a local or wide-area network (LAN or WAN), and/or the global communications network known as the Internet, which may accommodate many different communications media and protocols.

While there have been described herein what are considered to be exemplary and preferred embodiments of the present invention, other modifications of the invention will become apparent to those skilled in the art from the teachings herein. The particular methods of manufacture and geometries disclosed herein are exemplary in nature and are not to be considered limiting. It is therefore desired that all such modifications as fall within the spirit and scope of the invention be secured by the appended claims. Accordingly, what is desired to be secured by Letters Patent is the invention as defined and differentiated in the following claims, and all equivalents.

CLAIMS

1. A tuning system for tuning a speech recognition system, the tuning system comprising: a transmitter for sending an associated user response to a speech recognition system, wherein the associated user response is based at least in part on a test stimulus; a receiver for receiving a recognized response from a speech recognition system, wherein the recognized response is based at least in part on the associated user response; and an adjustment module for adjusting at least one parameter of a speech recognition system based at least in part on at least one of the test stimulus, the associated user response, and the recognized response.
 2. The tuning system of claim 1, further comprising a teststimulus generation module for sending a test stimulus to a user.
3. The tuning system of claim 1, further comprising a comparison module for comparing the associated user response to the recognized response, wherein the comparison module identifies an error between the associated user response and the recognized response.
4. The tuning system of claim 1, further comprising a comparison module for comparing the test stimulus to the recognized response, wherein the comparison module identifies an error between the test stimulus and the recognized response.
5. The tuning system of claim 3, wherein the comparison module compares an acoustic feature of the test stimulus to an acoustic feature of the associated user response.
6. The tuning system of claim 5, wherein the acoustic feature comprises at least one of a cepstral coefficient and a speech feature.
7. The tuning system of claim 3, wherein the adjustment module adjusts the at least one parameter based at least in part on the error.
8. The tuning system of claim 7, wherein the adjustment module predicts at least a second parameter based at least in part on the error.
9. The tuning system of claim 3, further comprising a storage module for storing at least one of the test stimulus, the associated user response, and the recognized response.
10. The tuning system of claim 9, wherein the storage module stores a plurality of test stimuli, a plurality of associated user responses, and a plurality of recognized responses.
11. The tuning system of claim 10, wherein the comparison module compares at least two of the plurality of test stimuli, the plurality of associated user responses, and the plurality of recognized responses and generates a speech model based at least in part on the comparison.
12. A method of tuning a speech recognition system, the method comprising the steps of: transmitting an associated user response to a speech recognition system, wherein the associated user response is based at least in part on a test stimulus; receiving a recognized response from a speech recognition system, wherein the recognized response is based at least in part on the associated user response; and adjusting at least one parameter of a speech recognition system based at least in part on at least one of the test stimulus, the associated user response, and the recognized response.
13. The method of claim 12, further comprising the steps of: selecting a test stimulus; and sending the test stimulus to a user.
14. The method of claim 13, further comprising the step of comparing the associated user response to the recognized response.
15. The method of claim 14, further comprising the step of storing the associated user response and the recognized response.
16. The method of claim 15, further comprising the steps of: repeating the selecting step, the sending step, the transmitting step, the receiving step, the adjusting step, the comparing step, and the storing step; and creating an error set.
17. The method of claim 16, wherein the error set comprises a first difference between a first associated user response and a first recognized response and a second difference between a second associated user response and a second recognized response.
18. The method of claim 17, further comprising the step of predicting at least a second parameter based at least in part on the error set.
19. The method of claim 14, wherein the comparing step compares an acoustic feature of the associated user response to an acoustic feature of the recognized response.
20. The method of claim 19, wherein the acoustic feature comprises at least one of a cepstral coefficient and a speech feature.
21. An article of manufacture having computer-readable program portions embedded thereon for tuning a speech recognition system, the program portions comprising: instructions for transmitting an associated user response to a speech recognition system, wherein the associated user response is based at least in part on a test stimulus; instructions for receiving a recognized response from a speech recognition system, wherein the recognized response is based at least in part on the associated user response; and instructions for adjusting at least one parameter of a speech recognition system based at least in part on at least one of the test stimulus, the associated user response, and the recognized response.